472 research outputs found

Twinscan: A Software Package for Homology-Based Gene Prediction

Author: Flicek Paul
Publication venue: Washington University Open Scholarship
Publication date: 14/02/2003

A complete mapping from genome to proteome would constitute a foundation for genome-based biology and provide targets for pharmaceutical and therapeutic intervention. This is one reason gene structure prediction has been a major subfield of computational biology for over 20 years. Many of the widely used gene prediction systems were developed in the 1990s and are unable to take advantage of the revolution in comparative genomics brought on by the sequencing of the entire genomes of an increasing number of vertebrates. Twinscan is a new system for high-throughput gene-structure prediction that exploits the patterns of conservation observed in alignments between a target genomic sequence and its homologous sequence in other organisms. The approach employs a symbolic conservation sequence that effectively combines many local alignments into a single global alignment. This has several important properties that make Twinscan particularly useful for high-throughput gene prediction. For mammals, Twinscan has been shown to be significantly more accurate and reliable by all measures than any non-comparative genomic method. Twinscan is based on, and includes as a component, the same hidden Markov model topology as Genscan, a popular non-homology-based gene prediction program. Twinscan has an object-oriented design and is implemented in the C++ programming language. Twinscan’s three major components consist of probabilistic models of both the DNA sequence and the conservation sequence as well as a dynamic programming framework. Both the models and the computational structure are complicated aggregate classes. In this report, the design and implementation of Twinscan are described at the source-code level for the first time.

Washington University St. Louis: Open Scholarship

Chapter Functional Annotation of Rare Genetic Variants

Author: Flicek Paul
Ritchie Graham
Publication venue: Springer Nature
Publication date: 02/06/2021

Genome-wide association studies have successfully identified a growing number of common variants that robustly associate with a wide range of complex diseases and phenotypes. In the majority of cases, though, the variants are predicted to have small to modest effect sizes, and, due to the technologies used, many of the signals discovered so far may not be the causal loci. As rare variation studies begin to explore the lower ranges of the allele frequency spectrum, using whole genome or whole exome sequencing to capture a larger proportion of variants, we expect to find variants with a more direct causal role in the phenotype(s) of interest. Interpreting possible functional mechanisms linking variants with phenotypes will become increasingly important.

Directory of Open Access Books (DOAB)

Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts

Author: Brent Michael R
Flicek Paul
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As part of the ENCODE Genome Annotation Assessment Project (EGASP), we developed the MARS extension to the Twinscan algorithm. MARS is designed to find human alternatively spliced transcripts that are conserved in only one or a limited number of extant species. MARS is able to use an arbitrary number of informant sequences and predicts a number of alternative transcripts at each gene locus. RESULTS: MARS uses the mouse, rat, dog, opossum, chicken, and frog genome sequences as pairwise informant sources for Twinscan and combines the resulting transcript predictions into genes based on coding (CDS) region overlap. Based on the EGASP assessment, MARS is one of the more accurate dual-genome prediction programs. Compared to the GENCODE annotation, we find that predictive sensitivity increases, while specificity decreases, as more informant species are used. MARS correctly predicts alternatively spliced transcripts for 11 of the 236 multi-exon GENCODE genes that are alternatively spliced in the coding region of their transcripts. For these genes a total of 24 correct transcripts are predicted. CONCLUSION: The MARS algorithm is able to predict alternatively spliced transcripts without the use of expressed sequence information, although the number of loci in which multiple predicted transcripts match multiple alternatively spliced transcripts in the GENCODE annotation is relatively small.

Springer - Publisher Connector

PubMed Central

Uncovering information on expression of natural antisense transcripts in Affymetrix MOE430 datasets

Author: Flicek Paul
Lang Roland
Mages Joerg
Oeder Sebastian
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The function and significance of the widespread expression of natural antisense transcripts (NATs) is largely unknown. The ability to quantitatively assess changes in NAT expression for many different transcripts in multiple samples would facilitate our understanding of this relatively new class of RNA molecules. RESULTS: Here, we demonstrate that standard expression analysis Affymetrix MOE430 and HG-U133 GeneChips contain hundreds of probe sets that detect NATs. Probe sets carrying a "Negative Strand Matching Probes" annotation in NetAffx were validated using Ensembl by manual and automated approaches. More than 50% of the 1,113 probe sets with "Negative Strand Matching Probes" on the MOE430 2.0 GeneChip were confirmed as detecting NATs. Expression of selected antisense transcripts as indicated by Affymetrix data was confirmed using strand-specific RT-PCR. Thus, Affymetrix datasets can be mined to reveal information about the regulated expression of a considerable number of NATs. In a correlation analysis of 179 sense-antisense (SAS) probe set pairs using publicly available data from 1,637 MOE430 2.0 GeneChips, a significant number of SAS transcript pairs were found to be positively correlated. CONCLUSION: Standard expression analysis Affymetrix GeneChips can be used to measure many different NATs. The large number of samples deposited in microarray databases represents a valuable resource for a quantitative analysis of NAT expression and regulation in different cells, tissues and biological conditions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Consistent annotation of gene expression arrays

Author: Ballester Benoît
Flicek Paul
Johnson Nathan
Proctor Glenn
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. RESULTS: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. CONCLUSIONS: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

Crossref

Springer - Publisher Connector

HAL AMU

Directory of Open Access Journals

PubMed Central

Considerations for the inclusion of 2x mammalian genomes in phylogenetic analyses

Author: Birney Ewan
Flicek Paul
Herrero Javier
Vilella Albert J
Publication venue: BioMed Central
Publication date: 01/01/2011

Comment on Milinkovitch et al.: http://genomebiology.com/2010/11/2/R1

PubMed Central

UCL Discovery

DIAL UCLouvain

Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor

Author: Chen Yuan
Cunningham Fiona
Flicek Paul
McLaren William
Pritchard Bethan
Rios Daniel
Publication venue: Oxford University Press
Publication date

Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.

Crossref

PubMed Central

Mitochondrial heteroplasmy in vertebrates using ChIP-sequencing data.

Author: Flicek Paul
Horvath Julie
Odom Duncan T
Rensch Thomas
Villar Diego
Publication venue: Genome Biol
Publication date: 01/01/2016

BACKGROUND: Mitochondrial heteroplasmy, the presence of more than one mitochondrial DNA (mtDNA) variant in a cell or individual, is not as uncommon as previously thought. It is mostly due to the high mutation rate of the mtDNA and limited repair mechanisms present in the mitochondrion. Motivated by mitochondrial diseases, much focus has been placed on studying this phenomenon in human samples and in medical contexts. To place these results in an evolutionary context and to explore general principles of heteroplasmy, we describe an integrated cross-species evaluation of heteroplasmy in mammals that exploits previously reported NGS data. Focusing on ChIP-seq experiments, we developed a novel approach to detect heteroplasmy from the concomitant mitochondrial DNA fraction sequenced in these experiments. RESULTS: We first demonstrate that the sequencing coverage of mtDNA in ChIP-seq experiments is sufficient for heteroplasmy detection. We then describe a novel method for accurate detection of heteroplasmies, which also accounts for the error rate of NGS technology. Applying this method to 79 individuals from 16 species resulted in 107 heteroplasmic positions present in a total of 45 individuals. Further analysis revealed that the majority of detected heteroplasmies occur in intergenic regions. CONCLUSION: In addition to documenting the prevalence of mtDNA in ChIP-seq data, the results of our mitochondrial heteroplasmy detection method suggest that mitochondrial heteroplasmies identified across vertebrates share characteristics similar to those found for human heteroplasmies. Although largely consistent with previous studies in individual vertebrates, our integrated cross-species analysis provides valuable insights into the evolutionary dynamics of mitochondrial heteroplasmy.

Springer - Publisher Connector

PubMed Central

Apollo (Cambridge)

Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression.

Author: Berthelot Camille
Flicek Paul
Horvath Julie E
Odom Duncan T
Villar Diego
Publication venue: Nat Ecol Evol
Publication date: 08/04/2017

To gain insight into how mammalian gene expression is controlled by rapidly evolving regulatory elements, we jointly analysed promoter and enhancer activity with downstream transcription levels in liver samples from 15 species. Genes associated with complex regulatory landscapes generally exhibit high expression levels that remain evolutionarily stable. While the number of regulatory elements is the key driver of transcriptional output and resilience, regulatory conservation matters: elements active across mammals most effectively stabilize gene expression. In contrast, recently evolved enhancers typically contribute weakly, consistent with their high evolutionary plasticity. These effects are observed across the entire mammalian clade and are robust to potential confounders, such as the gene expression level. Using liver as a representative somatic tissue, our results illuminate how the evolutionary stability of gene expression is profoundly entwined with both the number and conservation of surrounding promoters and enhancers.

Apollo (Cambridge)